    FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences

    Background: Advances in sequencing technologies challenge the efficient import and validation of FASTA-formatted sequence data, which is still a prerequisite for most bioinformatics tools and pipelines. A comparative analysis of commonly used Bio* frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy are hampered. Findings: FastaValidator is a platform-independent, standardized, lightweight software library written in Java. It targets computer scientists and bioinformaticians writing software that needs to parse large amounts of sequence data quickly and accurately. For end users, FastaValidator includes interactive, out-of-the-box validation of FASTA-formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines. Conclusions: The accuracy and performance of the FastaValidator library make it suitable for large data sets, such as those commonly produced by massively parallel (next-generation sequencing, NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA-formatted sequence data.
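
To make the validation task concrete, the Java sketch below checks basic FASTA structure: each record must start with a `>` header carrying a non-empty identifier and must be followed by at least one valid sequence line. It is not the FastaValidator API. The class name `FastaCheck`, its error messages, and the accepted alphabet (IUPAC nucleotide codes plus `-` for gaps, case-insensitive) are hypothetical choices made for illustration, and the real library's rules may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Minimal FASTA structure check (hypothetical sketch, not the FastaValidator API).
 * Assumed rules: a record is a '>' header with a non-empty identifier, followed by
 * one or more sequence lines from the IUPAC nucleotide alphabet plus '-'.
 * Blank lines are ignored.
 */
final class FastaCheck {

    // Assumed alphabet: IUPAC nucleotide codes and the gap symbol, any case
    private static final Pattern SEQUENCE_LINE =
            Pattern.compile("[ACGTURYSWKMBDHVN\\-]+", Pattern.CASE_INSENSITIVE);

    private FastaCheck() {}

    /** Returns all error messages found; an empty list means the input passed. */
    public static List<String> validate(String text) {
        List<String> errors = new ArrayList<>();
        String[] lines = text.split("\\R");   // split on any line terminator
        boolean inRecord = false;             // has a header been seen yet?
        int residuesInRecord = 0;             // valid residues since the last header
        int headerLine = 0;                   // 1-based line number of current header

        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].strip();
            int lineNo = i + 1;
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record; it must have had sequence
                if (inRecord && residuesInRecord == 0) {
                    errors.add("line " + headerLine + ": header has no sequence");
                }
                if (line.substring(1).isBlank()) {
                    errors.add("line " + lineNo + ": empty identifier");
                }
                inRecord = true;
                residuesInRecord = 0;
                headerLine = lineNo;
            } else if (!inRecord) {
                errors.add("line " + lineNo + ": sequence before first header");
            } else if (!SEQUENCE_LINE.matcher(line).matches()) {
                errors.add("line " + lineNo + ": invalid character in sequence");
            } else {
                residuesInRecord += line.length();
            }
        }
        // Check the final record, and reject input with no header at all
        if (inRecord && residuesInRecord == 0) {
            errors.add("line " + headerLine + ": header has no sequence");
        }
        if (!inRecord) {
            errors.add("no FASTA header found");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate(">seq1 example\nACGTN\nacgt\n>seq2\nRYKM-\n"));
        System.out.println(validate("ACGT\n>seq3\nACXT\n>\n"));
    }
}
```

The sketch reads the whole input as a String for brevity; for NGS-scale files, reading line by line with java.io.BufferedReader would avoid holding the entire file in memory.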

    MetaBar - a tool for consistent contextual data acquisition and standards compliant submission

    Background: Environmental sequence datasets are growing at an exponential rate; however, the vast majority of them lack appropriate descriptors such as sampling location, time and depth/altitude, generally referred to as metadata or contextual data. The consistent capture and structured submission of these data is crucial for integrated data analysis and ecosystem modeling. The application MetaBar has been developed to support consistent contextual data acquisition. Results: MetaBar is a spreadsheet- and web-based software tool designed to assist users in the consistent acquisition, electronic storage, and submission of contextual data associated with their samples. A preconfigured Microsoft Excel spreadsheet is used to initiate structured contextual data storage in the field or laboratory. Each sample is given a unique identifier, and the sheets can be uploaded to the MetaBar database server at any stage. To label samples, identifiers can be printed as barcodes. An intuitive web interface provides quick access to the contextual data in the MetaBar database, as well as user and project management capabilities. Export functions facilitate contextual and sequence data submission to the International Nucleotide Sequence Database Collaboration (INSDC), comprising the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory database (EMBL) and GenBank. MetaBar requests and stores contextual data in compliance with the Genomic Standards Consortium specifications. The MetaBar open-source code base for local installation is available under the GNU General Public License version 3 (GNU GPL3). Conclusion: The MetaBar software supports the typical workflow from data acquisition and field sampling to the submission of sequences enriched with contextual data to an INSDC database.
The integration with the megx.net marine Ecological Genomics database and portal facilitates georeferenced data integration and metadata-based comparisons of sampling sites, as well as interactive data visualization. The extensive export functions and INSDC submission support enable data exchange across disciplines and help safeguard contextual data.

    Ecogenomic Perspectives on Domains of Unknown Function: Correlation-Based Exploration of Marine Metagenomes

    Background: The proportion of conserved DNA sequences with no clear function is steadily growing in bioinformatics databases. Studies of sequence and structural homology have indicated that many uncharacterized protein domain sequences are variants of functionally described domains. If these variants promote an organism's ecological fitness, they are likely to be conserved in the genome of its progeny and in the population at large. The genetic composition of microbial communities in their native ecosystems is accessible through metagenomics. We hypothesize that the co-variation of protein domain sequences across metagenomes from similar ecosystems will provide insights into their potential roles and aid further investigation. Methodology/Principal findings: We calculated the correlation of Pfam protein domain sequences across the Global Ocean Sampling metagenome collection, employing conservative detection and correlation thresholds to limit results to well-supported hits and associations. We then examined intercorrelations between domains of unknown function (DUFs) and domains involved in known metabolic pathways using network visualization and cluster-detection tools. We used a cautious "guilt-by-association" approach, referencing knowledge-level resources to identify and discuss associations that offer insight into DUF function. We observed numerous DUFs associated with photobiologically active domains and prevalent in the Cyanobacteria. Other clusters included DUFs associated with DNA maintenance and repair, inorganic nutrient metabolism, and sodium-translocating transport domains. We also observed a number of clusters reflecting known metabolic associations, as well as cases that predicted functional reclassification of DUFs. Conclusion/Significance: Critically examining domain covariation across metagenomic datasets can grant new perspectives on the roles and associations of DUFs in an ecological setting.
Targeted attempts at DUF characterization in the laboratory or in silico may draw on these insights, and opportunities to discover new associations and corroborate existing ones will arise as more large-scale metagenomic datasets emerge.
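
The correlation step described in this abstract can be illustrated with a short Java sketch. It assumes each Pfam domain is represented by a vector of per-sample hit counts across metagenomes and uses Pearson's r as the correlation measure; the abstract does not state which coefficient or threshold the study used, so the class `DomainCorrelation`, the threshold parameter, and the example domain names are hypothetical. Domain pairs meeting the threshold become edges of the kind that network visualization or cluster-detection tools consume.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of correlation-based domain association across samples.
 * Assumptions (not from the study): profiles are per-sample hit counts,
 * Pearson's r is the measure, and pairs with r >= threshold are kept.
 */
final class DomainCorrelation {

    private DomainCorrelation() {}

    /** Pearson correlation of two equal-length vectors; NaN if either is constant. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two vectors of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0 || varY == 0) {
            return Double.NaN; // correlation is undefined for a constant profile
        }
        return cov / Math.sqrt(varX * varY);
    }

    /** Returns "A -- B" edges (names sorted) for every pair with r >= threshold. */
    static List<String> edges(Map<String, double[]> profiles, double threshold) {
        List<String> names = new ArrayList<>(profiles.keySet());
        Collections.sort(names); // deterministic pair order
        List<String> result = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            for (int j = i + 1; j < names.size(); j++) {
                double r = pearson(profiles.get(names.get(i)), profiles.get(names.get(j)));
                if (!Double.isNaN(r) && r >= threshold) {
                    result.add(names.get(i) + " -- " + names.get(j));
                }
            }
        }
        return result;
    }
}
```

Profiles with zero variance yield NaN and are skipped, since a domain observed at a constant level across all samples has no defined correlation. A real analysis would likely also normalize counts for sampling effort before correlating; that step is omitted here.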

    A Membrane-Bound Vertebrate Globin

    The family of vertebrate globins includes hemoglobin, myoglobin, and other O2-binding proteins of as yet unclear function. Among these, globin X is restricted to fish and amphibians. Zebrafish (Danio rerio) globin X is expressed at low levels in neurons of the central nervous system and appears to be associated with the sensory system. The protein harbors a unique N-terminal extension with putative N-myristoylation and S-palmitoylation sites, suggesting membrane association. Intracellular localization and transport of globin X were studied in 3T3 cells using green fluorescent protein fusion constructs. Both the myristoylation and palmitoylation sites are required for correct targeting and membrane localization of globin X. To the best of our knowledge, this is the first time that a vertebrate globin has been identified as a component of the cell membrane. Globin X has a hexacoordinate binding scheme and displays cooperative O2 binding with a variable affinity (P50 ≈ 1.3–12.5 torr), depending on buffer conditions. A respiratory function of globin X is unlikely, but, analogous to some prokaryotic membrane globins, it may either protect the lipids in the cell membrane from oxidation or act as a redox-sensing or signaling protein.

    Electron Transfer Function versus Oxygen Delivery: A Comparative Study for Several Hexacoordinated Globins Across the Animal Kingdom

    Caenorhabditis elegans globin GLB-26 (expressed from gene T22C1.2) has been studied in comparison with human neuroglobin (Ngb) and cytoglobin (Cygb) for its electron transfer properties. GLB-26 exhibits no reversible O2 binding and a relatively low CO affinity compared to myoglobin-like globins. These differences arise from its mechanism of gaseous ligand binding: the heme iron of GLB-26 is strongly hexacoordinated in the absence of external ligands, and, as for Ngb and Cygb, this internal ligand (probably the E7 distal histidine) must be replaced before CO or O2 can bind. Interestingly, ferrous bis-histidyl GLB-26 and Ngb, another strongly hexacoordinated globin, can transfer an electron to cytochrome c (Cyt-c) at a high bimolecular rate, comparable to the rates of inter-protein electron transfer in mitochondria. In addition, GLB-26 displays an unexpectedly rapid oxidation of the ferrous His-Fe-His complex without O2 actually binding to the iron atom, since the heme is oxidized by O2 faster than the distal histidine dissociates. These efficient electron transfer mechanisms could indicate a family of hexacoordinated globins that are functionally distinct from pentacoordinated globins.

    NO Dioxygenase Activity in Hemoglobins Is Ubiquitous In Vitro, but Limited by Reduction In Vivo

    Genomics has produced hundreds of new hemoglobin sequences, with examples in nearly every living organism. Structural and biochemical characterizations of many recombinant proteins reveal reactions, such as oxygen binding and NO dioxygenation, that appear general to the hemoglobin superfamily regardless of whether they are related to physiological function. Despite considerable attention to “hexacoordinate” hemoglobins (hxHbs), which are found in nearly every plant and animal, no clear physiological role has been assigned to them in any species. One popular and relevant hypothesis is that they protect against NO. Here we have tested a comprehensive representation of hexacoordinate hemoglobins from plants (rice hemoglobin), animals (neuroglobin and cytoglobin), and bacteria (Synechocystis hemoglobin) for their ability to scavenge NO compared with myoglobin. Our experiments include in vitro comparisons of NO dioxygenation, ferric NO binding, NO-induced reduction, NO scavenging with an artificial reduction system, and the ability to substitute for a known NO scavenger (flavohemoglobin) in E. coli. We conclude that none of these tests reveals any distinguishing predisposition of hxHbs toward a role in NO scavenging, but that any hemoglobin could likely serve this role in the presence of a mechanism for re-reducing the heme iron. Hence, future research on the role of Hbs in NO scavenging would benefit more from the identification of cognate reductases than from in vitro analysis of NO and O2 binding.

    Ecological structuring of bacterial and archaeal taxa in surface ocean waters

    The Global Ocean Sampling (GOS) expedition is currently the largest and geographically most comprehensive metagenomic dataset, including samples from the Atlantic, Pacific, and Indian Oceans. This study uses the wide range of environmental conditions and habitats encompassed by the GOS sites to investigate the ecological structuring of bacterial and archaeal taxon ranks. Community structures based on taxonomically classified 16S ribosomal RNA (rRNA) gene fragments at the phylum, class, order, family, and genus levels were examined using multivariate statistical analysis, and the results were inspected in the context of oceanographic environmental variables and structured habitat classifications. At all taxon rank levels, the community structures of neritic, oceanic, and estuarine biomes, as well as other exotic biomes (salt marsh, lake, mangrove), were readily distinguishable from each other. The communities were strongly structured by chlorophyll a concentration and more weakly, yet significantly, by temperature and salinity. Furthermore, there were significant correlations between community structures and habitat classification. These results were used to further investigate one-to-one relationships between taxa and environment and provided indications of ecological preferences shaped by primary production for both cultured and uncultured bacterial and archaeal clades.